Abstract Flux-limited and volume-limited galaxy samples are constructed from SDSS data releases DR4, DR6 and DR7 for statistical analysis. The two-point correlation functions $\xi(s)$, the monopole of the three-point correlation function $\zeta_0$, the projected two-point correlation function $w_p$ and the pairwise velocity dispersion $\sigma_{12}$ are measured to test whether the galaxy samples are fair for these statistics. We find that, as the sky coverage of SDSS increases, $\xi(s)$ of the flux-limited sample is extremely robust and insensitive to local structures at low redshift. But for volume-limited samples fainter than $L^*$, at large scales $s \gtrsim 10\,h^{-1}$ Mpc the deviation of $\xi(s)$ and $\zeta_0$ of DR7 from those of DR4 and DR6 increases with absolute magnitude. In the weakly nonlinear regime, $\zeta_0$ of the different data releases do not agree in any luminosity bin. Furthermore, $w_p$ of the DR7 volume-limited samples in luminosity bins fainter than $-M_{r,0.1} = [18.5, 19.5]$ are significantly larger, and $\sigma_{12}$ of the two faintest DR7 volume-limited samples displays a very different scale dependence from the DR4 and DR6 results. Our findings call for caution in interpreting clustering analyses of SDSS faint galaxy samples, and higher-order statistics of SDSS volume-limited samples in the weakly nonlinear regime. The first zero-crossing points of $\xi(s)$ of volume-limited samples are also investigated and discussed.

Key words: galaxies: distances and redshifts — galaxies: statistics — cosmology: obser- 
vation — cosmology: large-scale structure 



1 INTRODUCTION 

Clustering analysis of galaxy samples thrives on the availability of modern massive galaxy surveys. The two most successful and largest galaxy surveys to date are the Two-degree Field Galaxy Redshift Survey (2dFGRS, Colless et al. 2003) and the Sloan Digital Sky Survey (SDSS, York et al. 2000). The final data release of the 2dFGRS offers a 3-D map of roughly a quarter of a million galaxies, while SDSS has obtained spectra of $\sim 0.9$ million galaxies (Abazajian et al. 2009). The unprecedented number of galaxies and the enormous volume surveyed by SDSS define its unique role in the era of precision cosmology (Komatsu et al. 2010), through its power spectra and two-point correlation functions (2PCFs) at large scales (e.g. Tegmark et al. 2004; Eisenstein et al. 2005; Percival et al. 2010; Reid et al. 2010).



Another widely appreciated application of galaxy clustering analysis is to relate the galaxy distribution to dark matter and halos, aiming to infer the processes galaxies experienced during their formation and evolution. Interpretation of statistics of SDSS galaxy samples is dominated by the $\Lambda$CDM+halo model and its extensions, such as the halo occupation distribution (HOD, e.g. Berlind & Weinberg 2002; Kravtsov et al. 2004; Zheng et al. 2005) and the conditional luminosity function (CLF, Yang et al. 2003). For example, Zehavi et al. (2002, 2005, 2010) systematically explored the luminosity and color dependence of galaxy 2PCFs and extensively quantified the HOD parameters of galaxies; Cooray (2006) derived the occupation of central and satellite galaxies in halos and their corresponding conditional luminosity functions from a compilation of SDSS correlation functions, attempting to draw clues about galaxy evolution with reference to high-redshift samples; Li et al. (2007) directly compared the projected correlation functions and the pairwise velocity dispersion (PVD) of SDSS with those of mock galaxy samples populated in N-body simulations with the semi-analytic models (SAMs) of Kang et al. (2005) and Croton et al. (2006). They found that the SAMs roughly reproduce the observed clustering of SDSS galaxies, but the fraction of faint satellites in massive halos in the SAM prescriptions has to be reduced by $\sim 30$ percent to resolve the discrepancies in the PVD.

Yet there are challenges to the fairness of SDSS galaxy samples, i.e. whether galaxy samples of SDSS are complete and have enough volume to be a fair representation of the Universe. In fact, caution in reading physics out of measured statistics, especially correlation functions, has already been urged. Nichol et al. (2006) showed that excluding the Sloan Great Wall (at $z \sim 0.08$, Gott et al. 2005) would change the 2PCF by $\sim 40\%$ and the three-point correlation function (3PCF) by as much as $\sim 70\%$ for the sample defined by r-band absolute magnitudes $-22 < M_{r,0.1} < -19$. The apparent influence of superstructures on estimated correlation functions at large scales is somewhat counter-intuitive, since one usually takes it for granted that the depth and sky coverage of the SDSS galaxy sample are sufficient to reach homogeneity, so that spatial averaging would suppress the variance induced by a particular structure in a small patch of sky. Sylos Labini et al. (2009) noticed that the zero-crossing point of the 2PCF of the SDSS main galaxy sample varies with luminosity and sample depth, and anti-correlation is absent in the most recently measured 2PCF of the SDSS luminous red galaxy (LRG) sample (e.g. Martínez et al. 2009; Kazin et al. 2010). Using extreme-value statistical analysis, Antal et al. (2009) argue that either SDSS suffers from severe, sample-volume-dependent intrinsic systematic effects, or there are persistent density fluctuations that do not fade away on scales beyond the prediction of the standard $\Lambda$CDM model.

It is therefore important to check the fairness of the galaxy samples used in order to support confidence in the relevant analyses. Fairness means different things for different statistics and for samples constructed in different ways. Zehavi et al. (2010) have laboriously evaluated finite-volume effects and the impact of superstructures: they compared the 2PCFs of volume-limited galaxy sub-samples at full depth with those of the same sub-samples restricted to the smaller volume of the volume-limited sub-sample defined in the next fainter luminosity bin. Their experiment leads to the conclusion that finite-volume effects are insignificant for the anisotropic and projected 2PCFs in the nonlinear regime for their sub-samples brighter than $M_{r,0.1} = -19$. However, they find that including fainter galaxies causes anomalous behavior in the 2PCF, similar to but of smaller amplitude than the finding of Zehavi et al. (2005) based on an early SDSS release. We note that such an analysis is missing for galaxies fainter than $M_{r,0.1} = -18$, even though the 2PCF of their faintest sub-sample, $M_{r,0.1} \in [-18, -17]$, is adopted for estimating bias and HOD parameters.

These works mainly concentrate on changes in two-point statistics as the sample depth is altered; we instead check fairness through the enlargement of sky coverage, not only for 2PCFs but also for the monopole of 3PCFs in redshift space, projected 2PCFs and PVDs. There are data releases 4, 6 and 7 of the SDSS main galaxy catalogue (DR4, DR6 and DR7; Adelman-McCarthy et al. 2006, 2008; Abazajian et al. 2009, respectively), and the increment of sky coverage from DR4 to DR6 is roughly the same as that from DR6 to DR7. An advantage of investigating the effects of sample volume on correlation functions through sky coverage rather than survey depth is that the apparent-magnitude limits of the survey restrict the permitted range of depth adjustment, in particular for faint galaxies, which are visible only at low redshift and therefore occupy a very shallow sample volume. Another of our purposes is to see how correlation functions evolve naturally with the progress of a real survey.

Section 2 describes the SDSS data and the methods used to estimate the statistics; results are presented in Section 3; and the last section gives a summary and discussion.
Fig. 1 The left panel shows the definition of SDSS galaxy subsamples on the redshift-absolute magnitude plane: the two curves are the boundaries of the VAGC catalogue resulting from the imposed apparent magnitude limits, the overlapping rectangles delineate where the volume-limited samples are located, and dashed lines label the lower redshift cuts of our flux-limited samples. The right panel plots the distribution on the celestial sphere of galaxies of the volume-limited subsample $-M_{r,0.1} = [17, 18]$: green points are galaxies of SDSS DR4, red points indicate extra galaxies in DR6 and black points are galaxies added in DR7.



2 GALAXY SAMPLES AND ESTIMATION OF CORRELATION FUNCTIONS 
2.1 Sample construction

The safe galaxy sample of the New York University Value-Added Galaxy Catalog (NYU-VAGC, Blanton et al. 2005)¹ is a catalog of low-redshift galaxies (mostly below $z \sim 0.3$) defined by apparent magnitudes $14.5 < m_r < 17.6$. Three data releases in chronological order are selected, namely DR4, DR6 and DR7, whose spectroscopic survey areas are about 4,783, 6,860 and 8,032 square degrees respectively. As the spectroscopic coverage of SDSS is not uniform, we use only regions with spectroscopic completeness greater than 0.9. We did not perform fibre-collision correction to improve completeness, since the correction only becomes significant at scales $< 0.2\,h^{-1}$ Mpc for SDSS galaxies (Zehavi et al. 2002). To ensure the correct geometry, galaxies in the three catalogues are also filtered with their own accompanying survey windows, bright-star masks and completeness masks.

Flux-limited samples defined by the r-band apparent magnitude range $14.5 < m_r < 17.6$ and redshift $0.01 < z < 0.23$ are generated, yielding 300,661 galaxies in DR4, 447,407 in DR6 and 535,845 in DR7. In order to explore the influence of local galaxies on correlation functions, we also constructed confined flux-limited galaxy samples with near-end redshift cuts $z_{\min} = 0.037, 0.046, 0.071$. Volume-limited sub-samples are also produced in consecutive luminosity bins from $M_{r,0.1} = -17$ to $-22.5$, with bins one magnitude wide spaced in steps of 0.5 magnitude. The absolute magnitudes in NYU-VAGC are K-corrected to redshift $z = 0.1$, but evolution (e-)correction is not taken into account. We noticed that some galaxies have different apparent magnitudes in DR7 than in earlier data releases, so we constructed a couple of additional volume-limited samples from DR7 filtered with the masks of DR4 for comparison; the measurements indicate that such differences have little influence on the statistics employed.

Details of these samples are shown in Tables 1 and 2 and Figure 1. Comoving distances of galaxies are calculated in a flat $\Lambda$CDM universe with $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$ and $h = 0.7$.
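The distance conversion mentioned above can be sketched as follows. This is a minimal illustration that assumes a flat $\Lambda$CDM background with the quoted $\Omega_m = 0.3$ and distances in $h^{-1}$ Mpc, i.e. $c/H_0 = 2997.92458\,h^{-1}$ Mpc. The function name `comoving_distance` and the Simpson-rule integration are our own choices, not the pipeline actually used for this paper.

```python
import math

# c / H0 in h^-1 Mpc, with H0 = 100 h km/s/Mpc
C_OVER_H0 = 2997.92458


def comoving_distance(z, omega_m=0.3, n_steps=1000):
    """Line-of-sight comoving distance (h^-1 Mpc) in a flat LambdaCDM model.

    Integrates c/H0 * int_0^z dz' / E(z') with Simpson's rule, where
    E(z) = sqrt(omega_m (1+z)^3 + omega_lambda) and omega_lambda = 1 - omega_m.
    """
    if z <= 0.0:
        return 0.0
    omega_l = 1.0 - omega_m          # flatness fixes Omega_Lambda
    n = n_steps + (n_steps % 2)      # Simpson's rule needs an even number of steps
    step = z / n
    total = 0.0
    for k in range(n + 1):
        zk = k * step
        inv_e = 1.0 / math.sqrt(omega_m * (1.0 + zk) ** 3 + omega_l)
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * inv_e
    return C_OVER_H0 * total * step / 3.0


if __name__ == "__main__":
    for z in (0.011, 0.029, 0.194):
        print(f"z = {z:.3f}  D_C = {comoving_distance(z):7.2f} h^-1 Mpc")
```

The sketch only illustrates the conversion. Small differences from the distances listed in Table 2 are possible, for example if the tabulated redshift limits are rounded.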

1 http://sdss.physics.nyu.edu/vagc






Meng et al. 



Table 1 Numbers of galaxies in flux-limited samples defined by r-band apparent magnitude $14.5 < m_r < 17.6$ and redshift $z_{\min} < z < 0.23$.

z_min      0.010      0.037      0.046      0.071
DR4      300,661    281,400    268,247    216,373
DR6      447,407    417,426    397,543    321,915
DR7      535,845    498,445    473,980    382,921



Table 2 Volume-limited samples. Distances are in units of $h^{-1}$ Mpc.

Label  Luminosity             Redshift          Comoving distance    Number of galaxies
       M_r,0.1 - 5 log10 h    z_min    z_max    min       max        DR4       DR6       DR7
VL1    [-18.0, -17.0]         0.011    0.029     33.89     87.31      4,223     6,389     8,219
VL1+   [-18.5, -17.5]         0.014    0.037     42.53    108.95      7,292    11,543    14,343
VL2    [-19.0, -18.0]         0.018    0.046     53.32    135.61     11,639    18,328    22,500
VL2+   [-19.5, -18.5]         0.022    0.057     66.77    168.27     19,209    29,463    35,932
VL3    [-20.0, -19.0]         0.028    0.071     83.51    207.96     31,807    47,565    57,363
VL3+   [-20.5, -19.5]         0.035    0.087    104.24    255.70     50,719    75,162    89,654
VL4    [-21.0, -20.0]         0.044    0.107    129.83    312.96     59,215    87,295   103,924
VL4+   [-21.5, -20.5]         0.054    0.131    161.21    381.91     60,132    89,602   107,207
VL5    [-22.0, -21.0]         0.068    0.160    199.41    462.71     46,264    69,499    82,239
VL5+   [-22.5, -21.5]         0.083    0.194    245.46    555.88     24,002    36,677    43,631



2.2 Estimation of correlation functions 

2.2.1 Redshift space correlation functions 

The isotropic 2PCF $\xi(s)$ at separation $s$ in redshift space is measured with the estimator of Landy & Szalay (1993),

$$\xi(s) = \frac{DD - 2DR + RR}{RR}, \qquad (1)$$

in which DD, RR and DR are respectively the normalised numbers of weighted galaxy-galaxy, random-random and galaxy-random pairs at a given separation. To apply Eq. (1), a random sample is generated for each galaxy sample following its distributions of redshift and magnitude, geometric constraints, spectroscopic completeness and survey masks, but with 20 times as many points. Each galaxy and random point is assigned a weight according to its redshift and angular position to minimize the variance in the estimated $\xi$ (Efstathiou 1988; Hamilton 1993),

$$w_i = \frac{1}{1 + 4\pi \bar n \phi_i J_3(s)}, \qquad (2)$$

where $\phi_i$ is the selection function at the location of the $i$th galaxy, $\bar n$ is the mean number density (so that $\bar n \phi_i$ is the expected number density at the position of galaxy $i$), and $J_3(s) = \int_0^s \xi(s')\, s'^2\, ds'$. $J_3(s)$ is computed using a power-law $\xi(s)$ with correlation length $s_0 = 8\,h^{-1}$ Mpc and slope $\gamma = 1.2$ (Zehavi et al. 2002).
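As an illustration of Eqs. (1) and (2), the sketch below evaluates the Landy & Szalay estimator in a single separation bin, together with the pair weight built from the power-law $J_3(s)$. For $\xi(s) = (s/s_0)^{-\gamma}$ with $\gamma < 3$, this gives $J_3(s) = s_0^{\gamma} s^{3-\gamma}/(3-\gamma)$. The sketch assumes unweighted pair counts normalised by the total numbers of distinct pairs. With weights, the counts and normalisations would instead be sums of products of weights. All function names are hypothetical.

```python
import math


def j3_power_law(s, s0=8.0, gamma=1.2):
    """J_3(s) = int_0^s xi(s') s'^2 ds' for xi(s) = (s/s0)^(-gamma), gamma < 3."""
    return s0 ** gamma * s ** (3.0 - gamma) / (3.0 - gamma)


def pair_weight(n_mean, phi_i, s, s0=8.0, gamma=1.2):
    """Minimum-variance weight of Eq. (2): 1 / (1 + 4 pi n phi_i J_3(s))."""
    return 1.0 / (1.0 + 4.0 * math.pi * n_mean * phi_i * j3_power_law(s, s0, gamma))


def landy_szalay(dd, dr, rr, n_gal, n_ran):
    """Landy-Szalay xi from raw pair counts in one separation bin.

    dd and rr count distinct pairs within the galaxy / random catalogues and
    dr counts galaxy-random pairs; each is normalised by its total pair number.
    """
    dd_norm = dd / (0.5 * n_gal * (n_gal - 1))
    dr_norm = dr / (n_gal * n_ran)
    rr_norm = rr / (0.5 * n_ran * (n_ran - 1))
    return (dd_norm - 2.0 * dr_norm + rr_norm) / rr_norm
```

Consider 100 galaxies and 2,000 random points, matching the 20:1 ratio used here. If every count is the same fraction of the available pairs, the estimator returns zero, as expected for an unclustered sample.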

Since computing the full 3PCFs of all these galaxy samples takes too long, we measure the monopole of the 3PCF instead (Pan & Szapudi 2005). The monopole is an angle-averaged version of the 3PCF, defined as

$$\zeta_0(s_1, s_2) = 2\pi \int_{-1}^{1} \zeta(s_1, s_2, \theta)\, d\cos\theta, \qquad (3)$$

and estimated via

$$\zeta_0 = \frac{DDD - 3DDR + 3DRR - RRR}{RRR}, \qquad (4)$$

where the combined symbols of D and R are normalized numbers of triplets counted within and between the data sets of galaxies and random points. For example, let $n_i(s_1)$ be the number of galaxies around galaxy $i$ in bin $(s_1^{lo}, s_1^{hi})$ and $n_i(s_2)$ the number in bin $(s_2^{lo}, s_2^{hi})$. The DDD in Eq. (4) then reads

$$DDD = \begin{cases} \dfrac{\sum_i n_i(s_1)\, n_i(s_2)}{N_g (N_g - 1)(N_g - 2)}, & s_1 \neq s_2, \\[2ex] \dfrac{\sum_i n_i(s_1)\,[n_i(s_2) - 1]}{N_g (N_g - 1)(N_g - 2)}, & s_1 = s_2, \end{cases} \qquad (5)$$

where $N_g$ is the number of galaxies in the sample.
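To make the normalisation in Eq. (5) concrete, here is a brute-force sketch. It assumes galaxy positions are given as Cartesian coordinates in $h^{-1}$ Mpc. It counts $n_i$ in a separation shell around every galaxy, forms DDD for $s_1 = s_2$ or $s_1 \neq s_2$, and combines normalised triplet counts as in Eq. (4). The cost is $O(N^2)$, so it is meant only for small test sets, and the function names are hypothetical.

```python
import math


def neighbour_counts(points, s_lo, s_hi):
    """n_i: number of other galaxies with s_lo <= |x_j - x_i| < s_hi."""
    counts = []
    for i, p in enumerate(points):
        counts.append(sum(
            1 for j, q in enumerate(points)
            if j != i and s_lo <= math.dist(p, q) < s_hi
        ))
    return counts


def ddd_normalised(n1, n2, same_bin):
    """Eq. (5): ordered DDD triplets divided by N_g (N_g - 1) (N_g - 2).

    When s1 and s2 share a bin a neighbour cannot pair with itself, hence
    n_i (n_i - 1) instead of n_i(s1) n_i(s2).
    """
    n_g = len(n1)
    if same_bin:
        total = sum(a * (b - 1) for a, b in zip(n1, n2))
    else:
        total = sum(a * b for a, b in zip(n1, n2))
    return total / (n_g * (n_g - 1) * (n_g - 2))


def zeta0_estimator(ddd, ddr, drr, rrr):
    """Eq. (4): (DDD - 3 DDR + 3 DRR - RRR) / RRR from normalised triplet counts."""
    return (ddd - 3.0 * ddr + 3.0 * drr - rrr) / rrr
```

Take four galaxies spaced 1 $h^{-1}$ Mpc apart on a line with a shell $[0.5, 1.5)$. This gives $n_i = [1, 2, 2, 1]$, so for $s_1 = s_2$ the sketch yields $DDD = 4/24$.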



2.2.2 Projected 2PCF and PVD 



To minimize the effect of redshift distortion due to galaxies' peculiar motions, the separation $s$ (or $r$ in real space) is divided into two components: the part $\pi$ parallel to the line of sight and the part $\sigma$ perpendicular to it. The anisotropic 2PCF is measured on a grid of $(\sigma, \pi)$. Integrating $\xi(\sigma, \pi)$ over $\pi$ then yields a function free of redshift distortion, the projected 2PCF,

$$w_p(\sigma) = \int_{-\pi_{\max}}^{\pi_{\max}} \xi(\sigma, \pi)\, d\pi = \sum_i \xi(\sigma, \pi_i)\, \Delta\pi_i, \qquad (6)$$

where in practice the integration limit is $\pi_{\max} = 50\,h^{-1}$ Mpc.
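Eq. (6) amounts to a weighted sum over $\pi$ bins. The minimal sketch below rests on two assumptions of ours. First, the grid is measured only for $\pi \geq 0$, so the symmetry $\xi(\sigma, -\pi) = \xi(\sigma, \pi)$ supplies the factor of 2. Second, a bin straddling $\pi_{\max}$ contributes only its part below the limit. The function name is hypothetical.

```python
def projected_2pcf(xi_grid, pi_edges, pi_max=50.0):
    """w_p(sigma) = 2 * sum_m xi(sigma, pi_m) * dpi_m, truncated at pi_max.

    xi_grid[k][m] holds xi(sigma_k, pi_m) for the bin [pi_edges[m], pi_edges[m+1]).
    """
    wp = []
    for row in xi_grid:
        total = 0.0
        for m, xi in enumerate(row):
            lo, hi = pi_edges[m], pi_edges[m + 1]
            if lo >= pi_max:
                break                      # bins beyond the integration limit
            total += xi * (min(hi, pi_max) - lo)
        wp.append(2.0 * total)             # add the mirror-image pi < 0 half
    return wp
```

Take a constant $\xi = 0.1$ on 10 $h^{-1}$ Mpc bins out to 60 $h^{-1}$ Mpc. With $\pi_{\max} = 50\,h^{-1}$ Mpc the sketch gives $w_p = 2 \times 0.1 \times 50 = 10\,h^{-1}$ Mpc.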

It is well known that redshift distortion consists of two components that dominate in different regimes. Coherent infall is responsible for the enhancement of clustering at large scales, while the smearing of correlation strength at small scales is attributed to random motions. At large scales the boost to the 2PCF by peculiar velocities takes a particularly simple form (Kaiser 1987; Hamilton 1992),

$$\xi'(\sigma, \pi) = \xi_0(s) P_0(\mu) + \xi_2(s) P_2(\mu) + \xi_4(s) P_4(\mu), \qquad (7)$$

where $P_l(\mu)$ are Legendre polynomials and $\mu$ is the cosine of the angle between $\mathbf{r}$ and the line-of-sight direction $\pi$. Assuming $\xi(r) = (r/r_0)^{-\gamma}$, the multipoles obey

$$\begin{aligned} \xi_0(s) &= \left(1 + \frac{2\beta}{3} + \frac{\beta^2}{5}\right)\xi(r), \\ \xi_2(s) &= \left(\frac{4\beta}{3} + \frac{4\beta^2}{7}\right)\frac{\gamma}{\gamma - 3}\,\xi(r), \\ \xi_4(s) &= \frac{8\beta^2}{35}\,\frac{\gamma(2 + \gamma)}{(3 - \gamma)(5 - \gamma)}\,\xi(r), \end{aligned} \qquad (8)$$

where $\beta \approx \Omega_m^{0.6}/b$ and $b$ is the linear bias parameter. Note that the first equation is independent of the functional form of $\xi(r)$.
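The multipole factors of Eq. (8) are easy to tabulate. The sketch below assumes a pure power law $\xi(r) = (r/r_0)^{-\gamma}$ with $\gamma < 3$. It returns $\xi_l(s)/\xi(r)$ for $l = 0, 2, 4$ and rebuilds the large-scale $\xi'(\sigma, \pi)$ of Eq. (7) at a given $\mu$. The function names are illustrative.

```python
def kaiser_multipole_ratios(beta, gamma):
    """xi_l(s) / xi(r) for l = 0, 2, 4 (Eq. 8), power-law xi(r) with gamma < 3."""
    r0 = 1.0 + 2.0 * beta / 3.0 + beta ** 2 / 5.0
    r2 = (4.0 * beta / 3.0 + 4.0 * beta ** 2 / 7.0) * gamma / (gamma - 3.0)
    r4 = (8.0 * beta ** 2 / 35.0) * gamma * (2.0 + gamma) / ((3.0 - gamma) * (5.0 - gamma))
    return r0, r2, r4


def legendre_even(mu):
    """P_0, P_2, P_4 evaluated at mu."""
    mu2 = mu * mu
    return 1.0, 0.5 * (3.0 * mu2 - 1.0), (35.0 * mu2 * mu2 - 30.0 * mu2 + 3.0) / 8.0


def xi_linear_redshift_space(xi_r, mu, beta, gamma):
    """Eq. (7): xi'(sigma, pi) = sum_l xi_l(s) P_l(mu), with xi_l taken from Eq. (8)."""
    ratios = kaiser_multipole_ratios(beta, gamma)
    return xi_r * sum(c * p for c, p in zip(ratios, legendre_even(mu)))
```

For $\gamma < 3$ the quadrupole factor is negative. This is the large-scale squashing of $\xi'(\sigma, \pi)$ along the line of sight caused by coherent infall.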

To incorporate the effects of random motions, the anisotropic 2PCF in redshift space is approximated by a convolution of $\xi'(\sigma, \pi)$ in Eq. (7) with the distribution function of pairwise velocities $f(v_{12})$ (cf. Peebles 1993),

$$\xi(\sigma, \pi) = \int_{-\infty}^{+\infty} \xi'\!\left(\sigma, \pi - \frac{v_{12}}{H_0}\right) f(v_{12})\, dv_{12}, \qquad (9)$$

and $f(v_{12})$ is generally assumed to follow an exponential distribution with PVD $\sigma_{12}$,

$$f(v_{12}) = \frac{1}{\sqrt{2}\,\sigma_{12}} \exp\left(-\frac{\sqrt{2}\,|v_{12}|}{\sigma_{12}}\right). \qquad (10)$$

The parameter $\beta$ is usually derived from the ratio of $\xi(s)$ to $\xi(r)$ at large scales via the first equation in Eq. (8). The other model parameters can then be determined by combining Eqs. (7)-(10) to fit the $\xi(\sigma, \pi)$ data grid. Note that Jing et al. (1998) assumed a slightly different exponential distribution function of pairwise velocity, which Li et al. (2007) also adopted.
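Two ingredients of this fitting procedure are sketched below under stated assumptions. The first inverts the first relation of Eq. (8) for $\beta$ given a measured large-scale ratio $\xi(s)/\xi(r)$, taking the positive root. The second evaluates the convolution of Eq. (9) with the exponential PVD of Eq. (10) by a simple Riemann sum along $\pi$. Velocities are in km s$^{-1}$ and distances in $h^{-1}$ Mpc, so $H_0 = 100$. The integration range of $\pm 10\,\sigma_{12}$ and the grid size are arbitrary choices, and the function names are hypothetical.

```python
import math


def beta_from_ratio(ratio):
    """Positive root of 1 + 2 beta/3 + beta^2/5 = xi(s)/xi(r) (first line of Eq. 8)."""
    a, b, c = 0.2, 2.0 / 3.0, 1.0 - ratio
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def f_exponential(v12, sigma12):
    """Eq. (10): exponential pairwise-velocity PDF with dispersion sigma12."""
    return math.exp(-math.sqrt(2.0) * abs(v12) / sigma12) / (math.sqrt(2.0) * sigma12)


def convolve_random_motions(xi_prime_of_pi, pi, sigma12, h0=100.0,
                            v_range=10.0, n_steps=4000):
    """Eq. (9) at fixed sigma: int xi'(pi - v/H0) f(v) dv over |v| <= v_range * sigma12.

    xi_prime_of_pi is any callable returning the linear xi'(sigma, pi') at the
    fixed projected separation of interest.
    """
    v_max = v_range * sigma12
    dv = 2.0 * v_max / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        v = -v_max + k * dv
        total += xi_prime_of_pi(pi - v / h0) * f_exponential(v, sigma12)
    return total * dv
```

Because $f(v_{12})$ integrates to unity, convolving a constant $\xi'$ returns the same constant. This is a quick sanity check before fitting real $\xi(\sigma, \pi)$ grids.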
Fig. 2 Redshift space 2PCFs of flux-limited samples. [Four panels; horizontal axes: $s$ in $h^{-1}$ Mpc.]
2.2.3 Covariance Matrix 

Covariance matrices of our results are computed with the jack-knife technique (Lupton 1993; Zehavi et al. 2002). Each galaxy sample is divided into twenty separate slices of approximately equal sky area, and we perform the analysis twenty times, each time leaving a different slice out. Covariance matrices are generated from these twenty measurements. For instance, the covariance of the 2PCF measured in bins $i$ and $j$ is simply

$$\mathrm{Cov}(\xi_i, \xi_j) = \frac{N-1}{N} \sum_{k=1}^{N} (\xi_i^k - \bar\xi_i)(\xi_j^k - \bar\xi_j), \qquad (11)$$

in which $N = 20$ is the number of jack-knife sub-samples we used, $\xi_i^k$ is the measurement in bin $i$ with the $k$th slice left out, and $\bar\xi_i$ is the mean over the $N$ measurements.
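Eq. (11) can be sketched in a few lines of plain Python, assuming the $N$ leave-one-out measurements are stored as a list of per-bin values. The name `jackknife_covariance` is illustrative. Note the prefactor $(N-1)/N$ in place of the $1/(N-1)$ of an ordinary sample covariance. The leave-one-out measurements share most of their data and therefore scatter far less than independent realizations would, and the larger prefactor compensates for this.

```python
def jackknife_covariance(measurements):
    """Eq. (11) from N leave-one-out measurements.

    measurements[k][i] is the statistic in bin i with the k-th slice removed.
    """
    n = len(measurements)
    n_bins = len(measurements[0])
    mean = [sum(m[i] for m in measurements) / n for i in range(n_bins)]
    cov = [[0.0] * n_bins for _ in range(n_bins)]
    for i in range(n_bins):
        for j in range(n_bins):
            cov[i][j] = (n - 1) / n * sum(
                (m[i] - mean[i]) * (m[j] - mean[j]) for m in measurements
            )
    return cov
```

The 1σ error bar in bin $i$ is then $\sqrt{\mathrm{Cov}(\xi_i, \xi_i)}$.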

3 RESULTS 

3.1 Flux-limited samples 

The isotropic 2PCFs of the flux-limited samples in Table 1 are calculated first. Figure 2 shows that the redshift space 2PCFs of the flux-limited samples vary little between SDSS data releases. $\xi(s)$ of DR4 exhibits some deviation at large scales $\sim 100\,h^{-1}$ Mpc, but this is hardly significant given the huge cosmic variance at these scales. $\xi(s)$ of flux-limited samples of the same data release are displayed in the right panel of Figure 2. The redshift space 2PCF of SDSS shows no visible change when low-redshift galaxies are excluded, even when more than a quarter of the galaxies are lost (Table 1). Thus eliminating the local volume and enlarging the sky coverage from DR4 to DR7 have little influence on the measured clustering strength, and significant sample-volume-dependent effects are unlikely. As we are not interested in a general discussion of the SDSS main galaxy catalogue as a whole, we do not perform further analysis with other statistical measures.
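The fraction of galaxies removed by each near-end cut follows directly from Table 1. A quick check with raw counts only, ignoring the weights of Eq. (2), shows that the $z_{\min} = 0.071$ cut discards roughly 28 to 29 per cent of each flux-limited sample. The dictionary layout and function name are our own.

```python
# Galaxy counts from Table 1, keyed by near-end redshift cut z_min
TABLE1 = {
    "DR4": {0.010: 300661, 0.037: 281400, 0.046: 268247, 0.071: 216373},
    "DR6": {0.010: 447407, 0.037: 417426, 0.046: 397543, 0.071: 321915},
    "DR7": {0.010: 535845, 0.037: 498445, 0.046: 473980, 0.071: 382921},
}


def lost_fraction(release, z_min):
    """Fraction of the z > 0.01 flux-limited sample removed by a cut at z_min."""
    counts = TABLE1[release]
    return 1.0 - counts[z_min] / counts[0.010]


if __name__ == "__main__":
    for release in TABLE1:
        cuts = ", ".join(f"{z:.3f}: {lost_fraction(release, z):.1%}"
                         for z in (0.037, 0.046, 0.071))
        print(release, cuts)
```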

It is well known that faint galaxies have much lower linear bias than luminous ones (e.g. Tegmark et al. 2004; Zehavi et al. 2005; Li et al. 2007). When we discard many faint galaxies by imposing near-end redshift limits, $\xi(s)$ is expected to display a higher amplitude as the lower redshift cut increases. It could be that the loss in the number of galaxies (after proper weighting) is too small to produce any serious deviation (Table 1). In other words, $\xi(s)$ of the flux-limited sample may be dominated by galaxies around the peak of the redshift distribution.

3.2 Volume-limited samples 

3.2.1 2PCF and monopole of 3PCF in redshift space 

The 2PCFs $\xi(s)$ and monopoles of 3PCFs $\zeta_0(s_1, s_2)$ of volume-limited samples of the three SDSS data releases are measured to probe possible differences. In this paper we present only $\zeta_0(s_1 = s_2)$, whose amplitude is the strongest among the configurations of $(s_1, s_2)$ (Pan & Szapudi 2005). As seen in Figure 3, in the nonlinear regime major discrepancies appear in VL1, the sample of lowest luminosity. Differences between the DR4 and DR7 results are around $2\sigma$ at scales as small as $\sim 3\,h^{-1}$ Mpc, while the consistency of $\xi(s)$ and $\zeta_0$ of brighter volume-limited samples of SDSS is perfect at scales $s < 10\,h^{-1}$ Mpc.

At scales greater than $10\,h^{-1}$ Mpc, $\xi(s)$ of the different data releases agree within error bars for subsamples VL3+ to VL5+, but $\zeta_0$ varies at the level of $\sim 1\sigma$ (Figure 4). For the five faint galaxy samples VL1 to VL3, disagreement between $\xi(s)$ of DR7 and DR4 is already apparent in this regime, which is confirmed by their $\zeta_0$. We conclude that the modulation of correlation functions in redshift space resulting from the enlargement of sky coverage mainly occurs at scales ranging roughly from $\sim 10$ to $\sim 50\,h^{-1}$ Mpc, which is usually classified as the weakly nonlinear regime in structure formation theory. Applications and conclusions based on 3PCFs of SDSS volume-limited samples at large scales therefore appear somewhat suspect. For three-point correlation functions in redshift space, fairness of volume-limited samples is guaranteed only at small scales, i.e. in the strongly nonlinear regime.

Fig. 4 2PCFs $\xi(s)$ and monopoles of 3PCF $\zeta_0(s_1 = s_2)$ at large scales in redshift space of volume-limited samples.

3.2.2 The first zero-crossing points of 2PCFs

To investigate the claim of Sylos Labini et al. (2009), Figure 5 plots the first zero-crossing scales of $\xi(s)$ against the median luminosity of the volume-limited samples. The estimated $\xi(s)$ is effectively averaged over a scale bin $[s^{lo}, s^{hi}]$, and the quoted scale is set to $s = \sqrt{s^{lo} s^{hi}}$. Since our scale binning is unlikely to hit the zero points of $\xi(s)$ exactly, we show the range of scales within which $\xi(s)$ crosses zero, drawn as error bars around the geometric mean of the pair of scales. Figure 5 shows that, in general, the brighter the characteristic luminosity of the sample, the larger the first zero-crossing scale. The five faint volume-limited samples (VL1 to VL3) have roughly the same first zero-crossing scale, with mild variation between $\sim 30$ and $50\,h^{-1}$ Mpc. The crossing scale then rises abruptly to more than $100\,h^{-1}$ Mpc and even beyond the largest scale we measured ($\sim 170\,h^{-1}$ Mpc).
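The bracketing procedure just described can be written as a short function. This sketch assumes $\xi$ is tabulated in bins with edges `s_edges` and quoted at the geometric bin centres. It returns the centres on either side of the first sign change plus their geometric mean. When no crossing occurs within the measured range, only a lower bound is returned, corresponding to the points with only lower caps in Fig. 5. The function name and return convention are ours.

```python
import math


def first_zero_crossing(s_edges, xi):
    """Bracket the first zero crossing of a binned xi(s).

    s_edges has len(xi) + 1 entries; each bin is quoted at sqrt(s_lo * s_hi).
    Returns (scale just before xi turns negative, first negative scale,
    their geometric mean), or (largest scale probed, None, None) if xi
    never turns negative.
    """
    centres = [math.sqrt(lo * hi) for lo, hi in zip(s_edges[:-1], s_edges[1:])]
    for k, value in enumerate(xi):
        if value < 0.0:
            if k == 0:
                return None, centres[0], None   # already negative in the first bin
            lower, upper = centres[k - 1], centres[k]
            return lower, upper, math.sqrt(lower * upper)
    return centres[-1], None, None              # only a lower bound is available
```

With logarithmic bin edges 1, 2, 4, 8, 16 and $\xi$ turning negative in the last bin, the crossing is bracketed by $\sqrt{32}$ and $\sqrt{128}$, with geometric mean 8.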

For faint galaxy samples, the depths and hence the effective volumes are typically small, so the systematic effect of the integral constraint cannot be ignored (Landy & Szalay 1993; Bernstein 1994).
Fig. 5 Luminosity dependence of the first zero-crossing scales of $\xi(s)$ of volume-limited samples. Lower caps of error bars are scales where $\xi > 0$, and upper caps are the adjacent scales where $\xi$ first becomes negative. Points with only lower caps denote that the first zero-crossing point lies beyond the largest scale probed in this work, $\sim 170\,h^{-1}$ Mpc.



In the weak-correlation limit, the cosmic bias resulting from the integral constraint can be approximated by

$$\frac{\hat\xi}{\xi} - 1 \approx -\frac{\bar\xi(R)}{\xi}, \qquad \text{if } |\xi|,\ |\bar\xi(R)| \ll 1, \qquad (12)$$

in which $\hat\xi$ is the estimated 2PCF, $R$ is the smallest dimension of the sample, and $\bar\xi(R)$ is the average of the 2PCF over the sample volume, i.e. the density variance on the scale of the sample volume (Landy & Szalay 1993). There is no a priori correction for this bias unless we assume a model for the shape of the 2PCF. Since $\bar\xi$ is positive, $\hat\xi \approx \xi - \bar\xi(R)$ naturally has a smaller first zero-crossing scale than $\xi$. If, as usual, we assume that the galaxy bias $b$ is linear and scale independent, then $\hat\xi = b^2[\xi - \bar\xi(R)]$, and the correction to the first zero-crossing scale depends on sample volume only. As $\bar\xi(R)$ decreases slowly with $R$, the first zero-crossing scales of faint galaxy samples are expected to become gradually larger as the sample volume increases, which is true for VL2 to VL3. Surprisingly, however, this is not the case for VL1 and VL1+. These two faintest subsamples have the smallest volumes, yet the first zero-crossing scale of VL1 does not change as SDSS progresses from DR4 to DR7, while for VL1+ the scale of DR7 becomes smaller than that of DR4. Furthermore, the depths of VL3 and VL3+ do not differ much (Table 2), but the first zero-crossing scales of their $\xi$ differ hugely. The integral constraint alone cannot explain these findings.
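A toy calculation illustrates two of the statements above: subtracting a positive constant $\bar\xi$ moves the zero crossing inward, whereas multiplying by a deterministic $b^2$ leaves it unchanged. It uses a hypothetical correlation function $\xi(r) = (r/5)^{-1.8} - 0.002$, with $r$ in $h^{-1}$ Mpc, chosen only because it crosses zero and not fitted to any SDSS sample. Zeros are located by bisection. The offset $\bar\xi = 0.001$ and bias $b = 1.5$ are likewise illustrative values.

```python
def toy_xi(r):
    """Hypothetical 2PCF with a single zero crossing (illustration only)."""
    return (r / 5.0) ** -1.8 - 0.002


def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Root of f in [lo, hi], assuming f changes sign exactly once there."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


XI_BAR = 0.001   # hypothetical integral-constraint offset xi_bar(R)
B = 1.5          # hypothetical deterministic linear bias

r_true = bisect_root(toy_xi, 1.0, 1000.0)
r_ic = bisect_root(lambda r: toy_xi(r) - XI_BAR, 1.0, 1000.0)
r_biased = bisect_root(lambda r: B ** 2 * toy_xi(r), 1.0, 1000.0)

if __name__ == "__main__":
    print(f"true zero: {r_true:.1f}, with integral constraint: {r_ic:.1f}, "
          f"with b^2 = {B ** 2}: {r_biased:.1f}")
```

In this toy model the zero moves from about 158 to about 126 $h^{-1}$ Mpc when $\bar\xi$ is subtracted, while the $b^2$ rescaling reproduces the original root exactly.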

The increment of sky coverage from DR4 to DR6 is approximately the same as the gain from DR6 to DR7. Yet the first zero-crossing scales of DR7 differ only slightly from those of DR6, and in only two luminosity bins, while DR4 disagrees significantly with the other data releases. This makes it difficult to invoke any other simple geometric explanation, such as a fractal galaxy distribution. Ergodicity bias cannot be invoked either: for low-luminosity samples with low characteristic redshift, the correction $\Delta\xi$ is positive (Pan & Zhang 2010) and would push the zero point to larger scales, which obviously contradicts the observations. Nor can redshift distortion be responsible, as on large scales redshift distortion acts on the galaxy 2PCF multiplicatively.

The sudden change of the first zero-crossing scale from faint to bright galaxies probably implies that the composition of faint galaxy samples is very different from that of bright galaxy samples. This may be attributed to the shift of the leading role from satellite galaxies to central galaxies in samples brighter than $-M_{r,0.1} > 20$ (Li et al. 2007). Whatever the physical mechanism, mathematically its effect on the 2PCF is packed into a single function, the galaxy bias. The linear biasing model assumes that on large scales the galaxy 2PCF is $\xi_g = b^2 \xi_m$, in which $b \neq 0$ is a deterministic, scale-independent bias parameter and $\xi_m$ is the 2PCF of dark matter. Obviously, if the model holds, the zero point of $\xi_g$ coincides with that of $\xi_m$ whatever the value of $b$, and this remains true even for a scale-dependent $b$ as long as it is deterministic and non-zero. If we presume that the zero-crossing problem lies in the biasing, then either stochastic or nonlinear bias has to be invoked. A simple calculation indicates that if we adopt the bias parametrization of Fry & Gaztañaga (1993) and include the second-order bias parameter in the 2PCF, the leading-order effect is again multiplicative and cannot shift the first zero point of the 2PCF. It appears that stochastic biasing has to be considered. Details of the calculation are, however, beyond the scope of this paper and will be presented elsewhere.

Another interesting aspect is that the first zero-crossing scales of the 2PCFs of samples VL4 and VL4+ of DR4 are larger than the largest scale of our measurements, whereas those of DR6 and DR7 are not. The lack of anti-correlation in these two luminosity bins of DR4 is probably evidence of the modulation due to the Sloan Great Wall revealed by Zehavi et al. (2005) and Nichol et al. (2006). The increased sky coverage of DR6 and DR7 successfully weakens the influence of this superstructure (Zehavi et al. 2010).

3.2.3 Projected 2PCF and PVD 

$\xi(s)$ is a mixture of the real-space 2PCF and the PVD. The two can be disentangled with the projected 2PCF $w_p$. Measurements of $w_p$ are shown in Figure 6. We cross-checked our DR7 $w_p$ with the available results of Zehavi et al. (2010). The agreement is excellent except for sample VL2, whose $w_p$ in our measurement differs at scales $\sigma \gtrsim 4\,h^{-1}$ Mpc. As seen in Figure 6, $w_p$ of DR4 and DR6 agree well over the probed scale range in most luminosity bins. In several faint samples $w_p$ of DR6 is slightly larger at large scales around $\sigma \sim 10\,h^{-1}$ Mpc, but the excess is of low significance given the size of the error bars. For VL1 and VL2, $w_p$ of DR7 is boosted by more than 70% in amplitude relative to that of DR4, but the shape does not change. For subsamples in other luminosity bins, $w_p$ is stable against data release, though for VL1+ and VL2+ there are some minor changes within the error bars.

Figure 7 demonstrates the scale dependence of the PVDs $\sigma_{12}$ of samples of different luminosity, while Figure 8 shows the luminosity dependence of PVDs measured at scales $\sigma = 0.27$, 0.87, 2.7 and $8.7\,h^{-1}$ Mpc. The $\sigma_{12}$ of subsamples VL1, VL2 and VL2+ of DR7 differ significantly from the DR4 measurements. For VL2+, the PVD of DR7 agrees with earlier data at small scales but becomes higher at scales $\sigma > 1\,h^{-1}$ Mpc, which makes its scale dependence very weak. For VL2, $\sigma_{12}$ of DR7 roughly keeps the shape of DR4 but has a much larger amplitude. $\sigma_{12}$ of VL1 of DR7 has a steeper scale dependence and a higher amplitude at small scales than the DR4 and DR6 results. The $\sigma_{12}$ of the VL1 subsamples of DR4 and DR6 are rather flat, and do not follow the general trend that PVDs of galaxy samples with lower luminosities rise faster towards smaller scales (see also the PVDs of SAMs in Fig. 5 of Li et al. 2007). DR7, however, brings VL1 back on track. Comparing the celestial distributions of galaxies in the lowest luminosity bin of the three SDSS data releases reveals that the variation is induced by a big structure located roughly at RA 166° to 188° and Dec 16° to 26° (Figure 1). This is another example of the impact of superstructures on clustering analyses of LSS, in addition to the Sloan Great Wall.

Li et al. (2007) realised that the $w_p$ and PVDs of faint volume-limited samples of DR4 are too low to match the predictions of SAMs. Guided by the experiment of Slosar et al. (2006), they reduced the fraction of satellite galaxies in massive halos in the SAMs ad hoc by around 30% and approximately reproduced the actual measurements, which leaves a serious tension between models and observation. An eyeball check of our results against the SAM predictions in Li et al. indicates that the amplitude boost in $w_p$ and PVDs of DR7's faint volume-limited samples roughly closes the gap between DR4 and the SAMs, or at least eases the difficulties in theoretical modelling, although we do not have the SAM data to quantify the improvement. So, unlike the Sloan Great Wall, this large structure actually favours our working models. This casts some doubt on the advocated practice of cutting superstructures out of the original data so that the data fit a unified picture. After all, it is still too early to say which measurement is closer to the true clustering properties of these faint galaxies. We might need a deeper and wider survey than the present SDSS DR7 to reach good fairness and reduce the huge uncertainties.

Fig. 6 Projected 2PCFs of the volume-limited samples.

4 SUMMARY AND DISCUSSION 

We have extensively compared different data releases of the SDSS main galaxy catalogue, using 2PCFs in redshift space for flux-limited samples, and 2PCFs and monopoles of 3PCFs in redshift space, projected 2PCFs and PVDs for volume-limited samples. This comparison yields the following findings about galaxy clustering properties as the sky coverage of SDSS expands.

1. The redshift-space 2PCF $\xi(s)$ of the flux-limited sample is extremely robust against changes in sample volume, which secures the relevant applications; $\xi(s)$ is also insensitive to local structures at low redshift.

2. The redshift-space 2PCFs $\xi(s)$ of volume-limited samples of SDSS DR7 in luminosity bins brighter than $-M_{r,0.1} = [17, 18]$ are in good agreement with earlier data releases at scales $s \lesssim 10\,h^{-1}$ Mpc. At larger scales the consistency is broken for volume-limited samples fainter than $-M_{r,0.1} = [19.5, 20.5]$, and in general the deviation of DR7 from DR6 and DR4 grows with absolute magnitude. The zero-crossing points of DR7's $\xi(s)$ do not differ much from DR6's, but clearly shift away from DR4's.

Fig. 7 The scale dependence of pairwise velocity dispersions in the volume-limited samples.

3. Volume-limited samples of SDSS display convergence in $\zeta_0$ at scales $s \lesssim 10\,h^{-1}$ Mpc, except for the one in the faintest luminosity bin, while in the weakly nonlinear regime $\zeta_0$ of the different data releases do not agree in any luminosity bin.

4. Projected 2PCFs $w_p$ of volume-limited samples in luminosity bins brighter than $-M_{r,0.1} = [18.5, 19.5]$ are robust against data release, but for samples in fainter bins, $w_p$ of DR7 are significantly higher than those of earlier data. A similar phenomenon is also seen in PVDs. In DR7 the PVDs of the two faintest volume-limited samples also rise much more steeply towards small scales and flatten at higher luminosity, which turns out to be closer to the SAM predictions shown in Li et al. (2007).

5. The faintest volume-limited sample, $-M_{r,0.1} = [17, 18]$, is very peculiar: it suffers the largest 
variance as the sky coverage is enlarged. The agreement of its redshift-space $\xi(s)$ and $\zeta_0$ in DR7 
with earlier data breaks down at scales as small as $\sim 3\,h^{-1}$ Mpc; its $w_p$ is enhanced by about 
70%, and its PVDs differ greatly in amplitude and scale dependence from measurements on earlier 
data. 
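Several of the findings above are stated in terms of the projected 2PCF. As a reminder of how that quantity is formed, the sketch below integrates a binned redshift-space $\xi(r_p, \pi)$ along the line of sight, $w_p(r_p) = 2\int_0^{\pi_{\max}} \xi(r_p, \pi)\,d\pi$, as a simple sum over $\pi$ bins. The helper name, the binning, the toy power-law input and the value $\pi_{\max} = 40\,h^{-1}$ Mpc are assumptions made for illustration only, not necessarily the settings used in this paper.

```python
import numpy as np

def projected_correlation(xi_rp_pi, pi_edges):
    """Hypothetical helper: w_p(r_p) = 2 * sum_j xi(r_p, pi_j) * dpi_j.

    xi_rp_pi : array (n_rp, n_pi), xi measured in (r_p, pi) bins
    pi_edges : array (n_pi + 1,), line-of-sight bin edges; the last edge is pi_max
    """
    xi = np.asarray(xi_rp_pi, dtype=float)
    dpi = np.diff(np.asarray(pi_edges, dtype=float))  # widths of the pi bins
    if xi.shape[1] != dpi.size:
        raise ValueError("xi_rp_pi must have one column per pi bin")
    # Sum over pi bins for each r_p row; the factor 2 accounts for negative pi.
    return 2.0 * (xi @ dpi)

# Toy input: real-space power law xi(r) = (r / r0)^-gamma at r = sqrt(rp^2 + pi^2).
# r0 = 5 h^-1 Mpc and gamma = 1.8 are illustrative values only.
rp = np.array([0.5, 1.0, 5.0, 10.0])
pi_edges = np.linspace(0.0, 40.0, 401)  # pi_max = 40 h^-1 Mpc (assumed)
pi_mid = 0.5 * (pi_edges[:-1] + pi_edges[1:])
r = np.sqrt(rp[:, None] ** 2 + pi_mid[None, :] ** 2)
xi_toy = (r / 5.0) ** -1.8
wp = projected_correlation(xi_toy, pi_edges)
print(wp)
```

Integrating along the line of sight removes most of the redshift-space distortion, which is why $w_p$ is a convenient quantity for comparing data releases bin by bin.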
Fig. 8 The luminosity dependence of pairwise velocity dispersions at fixed scales. 



Fairness of a galaxy sample is assessed through statistical functions, so one cannot claim a general 
fair-sample hypothesis without specifying the statistical measure used. A galaxy sample may be fair for 
one statistical function but not for another. From our measurements, we conclude that the current SDSS 
cannot provide reliable 2PCFs (in both redshift space and projection) or PVDs for samples with 
characteristic luminosity fainter than $L^*$, nor reliable third-order statistics in the weakly nonlinear 
regime for nearly all volume-limited samples. 

For faint volume-limited subsamples, probably because of their very shallow depths, measurements suffer 
greater finite-volume effects, so enlarging the sky coverage influences the measured statistics more than 
it does for bright subsamples. The observed inconsistency is a manifestation of cosmic variance due to 
insufficient sample volume. The variances are comparable to the jack-knife error bars, which are usually 
regarded as a good and robust approximation to the true error bars (Zehavi et al. 2002). It now seems 
that the technique underestimates the true variance, so the corresponding results about how faint 
galaxies inhabit halos, as inferred from clustering analysis (e.g. Li et al. 2007b; Zehavi et al. 2010), 
are not very secure. Conclusions about faint galaxies drawn from the galaxy group catalogue constructed 
from SDSS DR4 (Yang et al. 2007, 2008) might also be problematic; we conjecture that a new group 
catalogue from DR7 may provide a very different picture. 
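The jack-knife error bars referred to above are usually computed with the delete-one recipe: split the survey into $N$ sub-regions, re-measure the statistic $N$ times with one sub-region removed each time, and take $\sigma^2 = \frac{N-1}{N}\sum_i (x_i - \bar{x})^2$ in each bin. The sketch below implements only that formula; the function name and the input array of re-measurements are hypothetical.

```python
import numpy as np

def jackknife_error(resampled):
    """Delete-one jack-knife standard error, bin by bin.

    resampled : array (N, n_bins); row i is the statistic (e.g. xi or w_p)
                re-measured with sub-region i removed (hypothetical input).
    """
    x = np.asarray(resampled, dtype=float)
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two jack-knife resamplings")
    mean = x.mean(axis=0)
    # The (N - 1) / N prefactor compensates for the strong overlap between resamplings.
    var = (n - 1) / n * np.sum((x - mean) ** 2, axis=0)
    return np.sqrt(var)

# Toy example: three resamplings of a one-bin statistic.
print(jackknife_error([[1.0], [2.0], [3.0]]))
```

All $N$ resamplings share most of the same sky area, so this estimate only probes fluctuations inside the surveyed volume and cannot see variations on scales comparable to the survey itself; that is one plausible reading of why it may underestimate the variance of faint, shallow subsamples.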

In our analysis, PVDs are derived under the general assumption that galaxy pairwise velocities closely 
follow an exponential distribution. The assumption might not be exact for satellite galaxies, whose 
pairwise velocity distribution can be better described by a Gaussian (Tinker 2007). Since galaxies of 
low luminosity are most likely satellites, the $\sigma_{12}$ obtained under the exponential assumption is 
biased, and so is the relation between PVDs and galaxy luminosity presented in Figure 8. Nevertheless, 
the PVDs of the different versions of VL1 are biased in the same way, so this systematic bias does not 
affect our basic conclusion that the PVD of VL1 in DR7 is very different from that in DR4. 
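To make the exponential-versus-Gaussian distinction concrete, the sketch below writes both pairwise velocity distributions with the same dispersion $\sigma_{12}$, using the standard exponential form $f(v) = \exp(-\sqrt{2}|v|/\sigma_{12})/(\sqrt{2}\,\sigma_{12})$. The value $\sigma_{12} = 400$ km/s and the velocity grid are arbitrary illustrative choices, not measurements from this paper. At equal $\sigma_{12}$ the exponential is more sharply peaked (by a factor $\sqrt{\pi}$ at $v = 0$) and has heavier tails, which suggests why fitting an exponential profile to nearly Gaussian satellite pairs can return a biased $\sigma_{12}$.

```python
import numpy as np

def exponential_pvd(v, sigma12):
    """Exponential pairwise velocity distribution with dispersion sigma12 (km/s)."""
    return np.exp(-np.sqrt(2.0) * np.abs(v) / sigma12) / (np.sqrt(2.0) * sigma12)

def gaussian_pvd(v, sigma12):
    """Gaussian pairwise velocity distribution with the same dispersion sigma12."""
    return np.exp(-0.5 * (v / sigma12) ** 2) / (np.sqrt(2.0 * np.pi) * sigma12)

sigma12 = 400.0                              # illustrative value, km/s
v = np.linspace(-6000.0, 6000.0, 200001)     # +/- 15 sigma12, fine grid
dv = v[1] - v[0]

for pdf in (exponential_pvd, gaussian_pvd):
    p = pdf(v, sigma12)
    norm = np.sum(p) * dv                    # should be ~1 for both
    rms = np.sqrt(np.sum(v ** 2 * p) * dv)   # should be ~sigma12 for both
    print(pdf.__name__, round(norm, 4), round(rms, 1))

# Same dispersion, different shape: ratio of the central peaks.
peak_ratio = exponential_pvd(0.0, sigma12) / gaussian_pvd(0.0, sigma12)
print(round(peak_ratio, 4))
```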

Recently, several works have applied the 3PCF of SDSS (e.g. Sefusatti et al. 2006; Kulkarni et al. 2007; 
Marín et al. 2008; Marín 2010; McBride et al. 2010), either to help determine cosmological parameters 
and galaxy bias or to diagnose models of galaxy formation. Some of these results use measurements of 
volume-limited subsamples of the SDSS main galaxy catalogue in the weakly nonlinear regime; our analysis, 
however, indicates that one needs to be very cautious in drawing conclusions from them. 

Another problem worth further discussion is the first zero-crossing point of the 2PCF. Part of the 
problem is certainly induced by the finite volume of the samples; at the least, the integral constraint is 
a serious systematic for low-luminosity galaxy subsamples. But for large-volume subsamples of bright 
galaxies, the absence of anti-correlation at large scales is still puzzling. Rather than challenging the 
validity of $\Lambda$CDM models, it is probably better to introduce stochastic bias into models of the 
galaxy 2PCF. The halo model alone cannot solve the problem, since at large scales the halo-model 2PCF 
reduces to the linear dark-matter 2PCF multiplied by a bias factor. Among the parameters used in 
cosmological applications, the large-scale galaxy 2PCF is by default fully described by a linear bias 
parameter and the dark-matter 2PCF. This single bias parameter is largely degenerate with other 
parameters, such as the normalization of density fluctuations $\sigma_8$ and the matter density parameter 
$\Omega_m$, so it is unclear whether present estimates of cosmological parameters are significantly biased 
by ignoring possible exotic bias (e.g. the proposal of Coles & Erdoğdu 2007). 
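The role of the integral constraint in the zero-crossing problem can be illustrated with a toy calculation. To first order, a 2PCF estimated in a finite volume is $\xi_{\rm obs}(r) \approx \xi(r) - {\rm IC}$, and one common approximation takes IC as the pair-count-weighted mean $\sum RR(r)\,\xi(r) / \sum RR(r)$. Subtracting this constant drives $\xi_{\rm obs}$ negative at large $r$, so a zero crossing can appear even when the underlying $\xi$ has none. The power-law $\xi$ ($r_0 = 5\,h^{-1}$ Mpc, $\gamma = 1.8$), the toy pair counts and the helper names below are all hypothetical.

```python
import numpy as np

def integral_constraint(rr, xi_true):
    """RR-weighted estimate of the integral constraint (an assumed approximation)."""
    rr = np.asarray(rr, dtype=float)
    xi_true = np.asarray(xi_true, dtype=float)
    return np.sum(rr * xi_true) / np.sum(rr)

def first_zero_crossing(r, xi):
    """First r where xi goes from positive to non-positive, by linear interpolation.

    Returns None if xi never crosses zero on the grid.
    """
    r = np.asarray(r, dtype=float)
    xi = np.asarray(xi, dtype=float)
    idx = np.flatnonzero((xi[:-1] > 0) & (xi[1:] <= 0))
    if idx.size == 0:
        return None
    i = idx[0]
    return r[i] + xi[i] * (r[i + 1] - r[i]) / (xi[i] - xi[i + 1])

# Toy model (all values hypothetical): a power-law xi that never crosses zero,
# and pair counts that grow as r^2 but are cut off by a finite survey size.
r = np.linspace(1.0, 150.0, 1500)
xi_true = (r / 5.0) ** -1.8
rr = r ** 2 * np.exp(-r / 60.0)
ic = integral_constraint(rr, xi_true)
xi_obs = xi_true - ic                      # what the finite volume would measure
print(first_zero_crossing(r, xi_true))     # no crossing in the underlying model
print(ic, first_zero_crossing(r, xi_obs))  # crossing induced by the constraint
```

In this toy the induced crossing sits where $\xi(r)$ equals IC, so anything that raises IC, such as a smaller survey volume, pulls the crossing inward; this is in line with the constraint mattering most for shallow low-luminosity subsamples.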